AITopics | processing element

processing element

If you are looking for an answer to the question "What is Artificial Intelligence?" and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

HW/SW Co-design of a PCM/PWM converter: a System Level Approach based in the SpecC Methodology

Petrini, Daniel G. P., Junior, Braz Izaias da Silva

arXiv.org Artificial Intelligence | Oct-28-2025

Abstract: [Original work from 2005; formatting revised in 2025, with no changes to the results.] We present a case study applying the SpecC methodology within a system-level hardware/software co-design flow to a PCM-to-PWM converter, the core of a Class-D audio amplifier. The converter was modeled and explored with the SpecC methodology to derive an HW/SW partition. Using system-level estimates and fast functional simulation, we evaluated mappings that meet real-time constraints while reducing the estimated cost of an all-hardware solution and avoiding the expense of a purely software implementation on a high-end processor. Despite the design's moderate complexity, the results underline the value of system-level co-design for early architectural insight, rapid validation, and actionable cost/performance trade-offs. The recent requirements of the semiconductor industry have led design methodologies to evolve in order to fill the design gap identified by the 1999 Roadmap [1]. Today's designs have very high integration density and complex functionalities to implement, giving rise to the need for a higher level of abstraction, the so-called system level.

artificial intelligence, implementation, real time system, (17 more...)

arXiv.org Artificial Intelligence

2510.22046

Country: North America > United States > California (0.47)

Genre: Research Report (0.50)

Industry:

Semiconductors & Electronics (1.00)
Information Technology > Hardware (0.34)

Technology:

Information Technology > Software (0.69)
Information Technology > Hardware (0.68)
Information Technology > Architecture > Real Time Systems (0.54)
Information Technology > Artificial Intelligence > Representation & Reasoning > Constraint-Based Reasoning (0.34)

CGRA4ML: A Framework to Implement Modern Neural Networks for Scientific Edge Computing

Abarajithan, G, Ma, Zhenghua, Li, Zepeng, Koparkar, Shrideep, Munasinghe, Ravidu, Restuccia, Francesco, Kastner, Ryan

arXiv.org Artificial Intelligence | Aug-28-2024

Scientific edge computing increasingly relies on hardware-accelerated neural networks to implement complex, near-sensor processing at extremely high throughputs and low latencies. Existing frameworks like HLS4ML are effective for smaller models, but struggle with larger, modern neural networks due to their requirement of spatially implementing the neural network layers and storing all weights in on-chip memory. CGRA4ML is an open-source, modular framework designed to bridge the gap between neural network model complexity and extreme performance requirements. CGRA4ML extends the capabilities of HLS4ML by allowing off-chip data storage and supporting a broader range of neural network architectures, including models like ResNet, PointNet, and transformers. Unlike HLS4ML, CGRA4ML generates SystemVerilog RTL, making it more suitable for targeting ASIC and FPGA design flows. We demonstrate the effectiveness of our framework by implementing and scaling larger models that were previously unattainable with HLS4ML, showcasing its adaptability and efficiency in handling complex computations. CGRA4ML also introduces an extensive verification framework, with a generated runtime firmware that enables its integration into different SoC platforms. CGRA4ML's minimal and modular infrastructure of Python API, SystemVerilog hardware, Tcl toolflows, and C runtime, facilitates easy integration and experimentation, allowing scientists to focus on innovation rather than the intricacies of hardware design and optimization.

cgra4ml, hls4ml, neural network, (16 more...)

arXiv.org Artificial Intelligence

2408.15561

Country:

North America > United States > Illinois > Kane County > Batavia (0.04)
North America > United States > California > San Diego County > San Diego (0.04)
Asia > Sri Lanka (0.04)

Genre: Research Report (0.64)

Industry:

Health & Medicine (0.68)
Semiconductors & Electronics (0.48)
Information Technology (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

CHOSEN: Compilation to Hardware Optimization Stack for Efficient Vision Transformer Inference

Sadeghi, Mohammad Erfan, Fayyazi, Arash, Somashekar, Suhas, Pedram, Massoud

arXiv.org Artificial Intelligence | Jul-17-2024

Vision Transformers (ViTs) represent a groundbreaking shift in machine learning approaches to computer vision. Unlike traditional approaches, ViTs employ the self-attention mechanism, which has been widely used in natural language processing, to analyze image patches. Despite their advantages in modeling visual tasks, deploying ViTs on hardware platforms, notably Field-Programmable Gate Arrays (FPGAs), introduces considerable challenges. These challenges stem primarily from the non-linear calculations and high computational and memory demands of ViTs. This paper introduces CHOSEN, a software-hardware co-design framework that addresses these challenges and automates ViT deployment on FPGAs in order to maximize performance. Our framework is built upon three fundamental contributions: (1) a multi-kernel design that maximizes bandwidth, mainly by exploiting multiple DDR memory banks; (2) approximate non-linear functions that exhibit minimal accuracy degradation and make efficient use of the available logic blocks on the FPGA; and (3) an efficient compiler that maximizes the performance and memory efficiency of the computing kernels through a novel design space exploration algorithm that finds the hardware configuration with optimal throughput and latency. Compared to the state-of-the-art ViT accelerators, CHOSEN achieves 1.5x and 1.42x improvements in throughput on the DeiT-S and DeiT-B models, respectively.

configuration, fpga, operation, (14 more...)

arXiv.org Artificial Intelligence

2407.12736

Country: North America > United States > California > Los Angeles County > Los Angeles (0.29)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Flex-TPU: A Flexible TPU with Runtime Reconfigurable Dataflow Architecture

Elbtity, Mohammed, Chandarana, Peyton, Zand, Ramtin

arXiv.org Artificial Intelligence | Jul-11-2024

Tensor processing units (TPUs) are one of the most well-known machine learning (ML) accelerators utilized at large scale in data centers as well as in tiny ML applications. TPUs offer several improvements and advantages over conventional ML accelerators, like graphical processing units (GPUs), being designed specifically to perform the multiply-accumulate (MAC) operations required in the matrix-matrix and matrix-vector multiplies extensively present throughout the execution of deep neural networks (DNNs). Such improvements include maximizing data reuse and minimizing data transfer by leveraging the temporal dataflow paradigms provided by the systolic array architecture. While this design provides a significant performance benefit, the current implementations are restricted to a single dataflow consisting of either input, output, or weight stationary architectures. This can limit the achievable performance of DNN inference and reduce the utilization of compute units. Therefore, the work herein consists of developing a reconfigurable dataflow TPU, called the Flex-TPU, which can dynamically change the dataflow per layer during run-time. Our experiments thoroughly test the viability of the Flex-TPU comparing it to conventional TPU designs across multiple well-known ML workloads. The results show that our Flex-TPU design achieves a significant performance increase of up to 2.75x compared to conventional TPU, with only minor area and power overheads.

architecture, dataflow, systolic array, (16 more...)

arXiv.org Artificial Intelligence

2407.087

Country:

North America > United States > South Carolina > Richland County > Columbia (0.14)
North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
North America > United States > New York > New York County > New York City (0.04)

Genre: Research Report > New Finding (0.48)

Industry: Information Technology > Services (0.49)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)

Ev-Edge: Efficient Execution of Event-based Vision Algorithms on Commodity Edge Platforms

Sridharan, Shrihari, Selvam, Surya, Roy, Kaushik, Raghunathan, Anand

arXiv.org Artificial Intelligence | Mar-23-2024

Event cameras have emerged as a promising sensing modality for autonomous navigation systems, owing to their high temporal resolution, high dynamic range and negligible motion blur. To process the asynchronous temporal event streams from such sensors, recent research has shown that a mix of Artificial Neural Networks (ANNs), Spiking Neural Networks (SNNs) and hybrid SNN-ANN algorithms is necessary to achieve high accuracies across a range of perception tasks. However, we observe that executing such workloads on commodity edge platforms, which feature heterogeneous processing elements such as CPUs, GPUs and neural accelerators, results in inferior performance. This is due to the mismatch between the irregular nature of event streams and diverse characteristics of algorithms on the one hand and the underlying hardware platform on the other. We propose Ev-Edge, a framework that contains three key optimizations to boost the performance of event-based vision systems on edge platforms: (1) an Event2Sparse Frame converter directly transforms raw event streams into sparse frames, enabling the use of sparse libraries with minimal encoding overheads; (2) a Dynamic Sparse Frame Aggregator merges sparse frames at runtime by trading off the temporal granularity of events and computational demand, thereby improving hardware utilization; and (3) a Network Mapper maps concurrently executing tasks to different processing elements while also selecting layer precision by considering both compute and communication overheads. On several state-of-the-art networks for a range of autonomous navigation tasks, Ev-Edge achieves 1.28x-2.05x improvements in latency and 1.23x-2.15x in energy over an all-GPU implementation on the NVIDIA Jetson Xavier AGX platform for single-task execution scenarios. Ev-Edge also achieves 1.43x-1.81x latency improvements over round-robin scheduling methods in multi-task execution scenarios.
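To make the first two optimizations concrete, the following is a minimal sketch of turning raw events into a sparse frame and merging two such frames. It is not the Ev-Edge implementation: the event layout `(x, y, timestamp, polarity)`, the dictionary-based sparse frame, and the function names `events_to_sparse_frame` and `merge_sparse_frames` are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical event layout: (x, y, timestamp, polarity), polarity in {-1, +1}.
Event = Tuple[int, int, float, int]
# Hypothetical sparse frame: only pixels with a non-zero accumulated value are stored.
SparseFrame = Dict[Tuple[int, int], int]


def events_to_sparse_frame(events: Iterable[Event],
                           t_start: float, t_end: float) -> SparseFrame:
    """Accumulate the polarity of every event in [t_start, t_end) per pixel."""
    frame = defaultdict(int)
    for x, y, t, polarity in events:
        if t_start <= t < t_end:
            frame[(x, y)] += polarity
    # Drop pixels whose events cancelled out so the frame stays sparse.
    return {pixel: value for pixel, value in frame.items() if value != 0}


def merge_sparse_frames(a: SparseFrame, b: SparseFrame) -> SparseFrame:
    """Merge two sparse frames into one covering a coarser time window."""
    merged = defaultdict(int, a)
    for pixel, value in b.items():
        merged[pixel] += value
    return {pixel: value for pixel, value in merged.items() if value != 0}
```

Merging frames trades temporal granularity for fewer, denser frames to process, which is the trade-off the abstract attributes to the aggregator; how Ev-Edge decides when to merge is not stated in the abstract and is not modeled here.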

ev-edge, event frame, sparse frame, (15 more...)

arXiv.org Artificial Intelligence

2403.15717

Country: North America > United States (0.14)

Genre: Research Report (0.64)

Industry: Information Technology (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.89)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.69)

DAISM: Digital Approximate In-SRAM Multiplier-based Accelerator for DNN Training and Inference

Sonnino, Lorenzo, Shresthamali, Shaswot, He, Yuan, Kondo, Masaaki

arXiv.org Artificial Intelligence | May-12-2023

DNNs are one of the most widely used Deep Learning models. The matrix multiplication operations for DNNs incur significant computational costs and are bottlenecked by data movement between the memory and the processing elements. Many specialized accelerators have been proposed to optimize matrix multiplication operations. One popular idea is to use Processing-in-Memory (PIM), where computations are performed by the memory storage element, thereby reducing the overhead of data movement between processor and memory. However, most PIM solutions rely either on novel memory technologies that have yet to mature or on bit-serial computations, which have significant performance overhead and scalability issues. In this work, an in-SRAM digital multiplier is proposed to take the best of both worlds, i.e. performing GEMM in memory but using only conventional SRAMs without the drawbacks of bit-serial computations. This allows the user to design systems with significant performance gains using existing technologies with little to no modifications. We first design a novel approximate bit-parallel multiplier that approximates multiplications with bitwise OR operations by leveraging multiple wordlines activation in the SRAM. We then propose DAISM (Digital Approximate In-SRAM Multiplier architecture), an accelerator for convolutional neural networks, based on our novel multiplier. This is followed by a comprehensive analysis of trade-offs in area, accuracy, and performance. We show that under similar design constraints, DAISM reduces energy consumption by 25% and the number of cycles by 43% compared to state-of-the-art baselines.
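One way to read "approximates multiplications with bitwise OR operations" is that the shifted partial products of a multiplication are combined with OR instead of being added. The sketch below illustrates that reading only; the abstract does not specify the multiplier's design, so the function `approx_mul_or` and its behavior are assumptions, not the DAISM circuit.

```python
def approx_mul_or(a: int, b: int) -> int:
    """Approximate a * b for non-negative integers.

    An exact shift-and-add multiplier sums one shifted copy of `a` for each
    set bit of `b`. Here the copies are OR-ed together instead (an assumed
    software model of the idea, not the hardware described in the paper).
    """
    if a < 0 or b < 0:
        raise ValueError("this sketch only handles non-negative operands")
    result = 0
    for i in range(b.bit_length()):
        if (b >> i) & 1:
            result |= a << i  # OR replaces the usual addition
    return result
```

Because `x | y <= x + y` for non-negative integers, this approximation never exceeds the exact product, and it is exact whenever the shifted partial products share no set bits (for example, when `b` is a power of two).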

artificial intelligence, machine learning, multiplier, (18 more...)

arXiv.org Artificial Intelligence

2305.07376

Country:

Europe (0.04)
North America > United States > New York > New York County > New York City (0.04)
Asia > Middle East > Jordan (0.04)
Asia > Japan (0.04)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Training a Limited-Interconnect, Synthetic Neural IC

Neural Information Processing Systems | Apr-6-2023, 19:58:08 GMT

Hardware implementation of neuromorphic algorithms is hampered by high degrees of connectivity. Functionally equivalent feedforward networks may be formed by using limited fan-in nodes and additional layers. No direct mapping of weights exists between fully and limited-interconnect nets. Low-level nonlinearities prevent the formation of internal representations of widely separated spatial features and the use of gradient descent methods to minimize output error is hampered by error magnitude dissipation. The judicious use of linear summations or collection units is proposed as a solution.

architecture, implementation, input space, (14 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Dataflow Architectures: Flexible Platforms for Neural Network Simulation

Neural Information Processing Systems | Apr-6-2023, 19:49:09 GMT

Dataflow architectures are general computation engines optimized for the execution of fine-grain parallel algorithms. Neural networks can be simulated on these systems with certain advantages. In this paper, we review dataflow architectures, examine neural network simulation performance on a new generation dataflow machine, compare that performance to other simulation alternatives, and discuss the benefits and drawbacks of the dataflow approach. Dataflow architectures are general computation engines that treat each instruction of a program as a separate task which is scheduled in an asynchronous, data-driven fashion. Dataflow programs are compiled into graphs which explicitly describe the data dependencies of the computation.
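The data-driven scheduling described above can be sketched in software: each node of the graph fires as soon as all of its operands are available, with no program counter dictating the order. This is a sequential toy model under assumed names (`run_dataflow` and the graph layout are hypothetical), not the machine evaluated in the paper.

```python
import operator
from collections import deque


def run_dataflow(graph, inputs):
    """Execute a dataflow graph in data-driven order.

    graph:  node name -> (function, list of operand names)
    inputs: operand name -> initial value
    Returns the computed values and the order in which nodes fired.
    """
    values = dict(inputs)
    # Count, per node, the operands that are not yet available.
    missing = {node: sum(1 for dep in deps if dep not in values)
               for node, (_, deps) in graph.items()}
    # Map each operand to the nodes that consume it.
    consumers = {}
    for node, (_, deps) in graph.items():
        for dep in deps:
            consumers.setdefault(dep, []).append(node)

    ready = deque(node for node, count in missing.items() if count == 0)
    fired = []
    while ready:
        node = ready.popleft()
        fn, deps = graph[node]
        values[node] = fn(*(values[dep] for dep in deps))
        fired.append(node)
        # Producing a value may enable downstream nodes.
        for consumer in consumers.get(node, []):
            missing[consumer] -= 1
            if missing[consumer] == 0:
                ready.append(consumer)
    return values, fired


# Example graph for (a + b) * (a - b); "sum" and "diff" are independent of
# each other, so a real dataflow machine could execute them in parallel.
example_graph = {
    "sum": (operator.add, ["a", "b"]),
    "diff": (operator.sub, ["a", "b"]),
    "prod": (operator.mul, ["sum", "diff"]),
}
example_values, example_order = run_dataflow(example_graph, {"a": 5, "b": 3})
```

In this example `prod` can only fire after both `sum` and `diff` have produced their results, which is exactly the dependency information the graph encodes.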

dataflow architecture, flexible platform, neural network simulation, (4 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.92)
Information Technology > Communications > Networks (0.68)

Qualcomm Snapdragon 8 Gen 2 Delivers More AI For Mobile

#artificialintelligence | Nov-16-2022, 10:25:34 GMT

The Snapdragon Tech Summit is a multi-day event that showcases the latest mobile technology Qualcomm has to offer. This is the second year that Qualcomm has held simultaneous events in China and Hawaii, as well as streaming the keynote addresses. Day 1 of the Snapdragon Tech Summit kicked off with the introduction of the latest system-on-chip (SoC) for smartphones, the Snapdragon 8 Gen 2. As expected, it delivers improvements in performance and efficiency for camera, connectivity, gaming, sound, and security. But the biggest punch comes from the use of artificial intelligence (AI) in just about every area. The company went so far as to call it "purpose built for AI." Qualcomm uses all of the Snapdragon SoC's processing elements for AI processing and calls the combination of these processing elements the "AI engine."

gen 2, qualcomm snapdragon 8, snapdragon 8, (15 more...)

#artificialintelligence

Country:

North America > United States > Hawaii (0.25)
Asia > China (0.25)

Industry:

Telecommunications (1.00)
Semiconductors & Electronics (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (0.73)
Information Technology > Communications > Mobile (0.59)

Security and Safety Aspects of AI in Industry Applications

Doran, Hans Dermot

arXiv.org Artificial Intelligence | Jul-16-2022

In this relatively informal discussion paper we summarise issues in the domains of safety and security in machine learning that will affect industry sectors in the next five to ten years. Various products using neural network classification, most often in vision-related applications but also in predictive maintenance, have been researched and applied in real-world applications in recent years. Nevertheless, reports of underlying problems in both safety- and security-related domains, for instance adversarial attacks, have unsettled early adopters and are threatening to hinder wider-scale adoption of this technology. The problem for real-world applicability lies in being able to assess the risk of applying these technologies. In this discussion paper we describe the process of arriving at a machine-learnt neural network classifier, pointing out safety and security vulnerabilities in that workflow and citing relevant research where appropriate.

architecture, arxiv, integrity, (15 more...)

arXiv.org Artificial Intelligence

2207.10809

Country:

North America > United States > Washington > King County > Seattle (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > Nevada > Clark County > Las Vegas (0.04)
(2 more...)

Genre: Research Report (0.50)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
